Mixture model averaging for clustering

نویسندگان

  • Yuhong Wei
  • Paul D. McNicholas
چکیده

Mixture Model Averaging for Clustering Yuhong Wei University of Guelph, 2012 Advisor: Dr. Paul D. McNicholas Model-based clustering is based on a finite mixture of distributions, where each mixture component corresponds to a different group, cluster, subpopulation, or part thereof. Gaussian mixture distributions are most often used. Criteria commonly used in choosing the number of components in a finite mixture model include the Akaike information criterion, Bayesian information criterion, and the integrated completed likelihood. The best model is taken to be the one with highest (or lowest) value of a given criterion. This approach is not reasonable because it is practically impossible to decide what to do when the difference between the best values of two models under such a criterion is ‘small’. Furthermore, it is not clear how such values should be calibrated in different situations with respect to sample size and random variables in the model, nor does it take into account the magnitude of the likelihood. It is, therefore, worthwhile considering a model-averaging approach. We consider an averaging of the top M mixture models and consider applications in clustering and classification. In the course of model averaging, the top M models often have different numbers of mixture components. Therefore, we propose a method of merging Gaussian mixture components in order to get the same number of clusters for the top M models. The idea is to list all the combinations of components for merging, and then choose the combination corresponding to the biggest adjusted Rand index (ARI) with the “reference model”. A weight is defined to quantify the importance of each model. The effectiveness of mixture model averaging for clustering is proved by simulated data and real data under the pgmm package, where the ARI from mixture model averaging for clustering are greater than the one of corresponding best model. The attractive feature of mixture model averaging is its computationally efficiency; it only uses the conditional membership probabilities. Herein, Gaussian mixture models are used but the approach could be applied effectively without modification to other mixture models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustering Gene Expression Profiles Using Mixture Model Ensemble Averaging Approach

Clustering has been an important tool for extracting underlying gene expression patterns from massive microarray data. However, most of the existing clustering methods cannot automatically separate noise genes, including scattered, singleton and mini-cluster genes, from other genes. Inclusion of noise genes into regular clustering processes can impede identification of gene expression patterns....

متن کامل

Partial Volume Segmentation of Cerebral MRI Scans with Mixture Model Clustering

A mixture model clustering algorithm is presented for robust MRI brain image segmentation in the presence of partial volume averaging. The method uses additional classes to represent partial volume voxels of mixed tissue type in the data with their probability distributions modeled accordingly. The image model also allows for tissue-dependent variance values and voxel neighborhood information i...

متن کامل

Bayesian Model-Averaging in Unsupervised Learning From Microarray Data

Unsupervised identification of patterns in microarray data has been a productive approach to uncovering relationships between genes and the biological process in which they are involved. Traditional model-based clustering approaches as well as some recently developed model-based mining approaches for integrating genomic and functional genomic data rely on one’s ability to determine the correct ...

متن کامل

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve the clustering efficiency, most of the state-of-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...

متن کامل

On Model-Based Clustering, Classification, and Discriminant Analysis

The use of mixture models for clustering and classification has burgeoned into an important subfield of multivariate analysis. These approaches have been around for a half-century or so, with significant activity in the area over the past decade. The primary focus of this paper is to review work in model-based clustering, classification, and discriminant analysis, with particular attenti...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Adv. Data Analysis and Classification

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2015